risk prediction model
Collapsing ROC approach for risk prediction research on both common and rare variants
Risk prediction that capitalizes on emerging genetic findings holds great promise for improving public health and clinical care. However, recent risk prediction research has shown that predictive tests formed on existing common genetic loci, including those from genome-wide association studies, have lacked sufficient accuracy for clinical use. Because most rare variants on the genome have not yet been studied for their role in risk prediction, future disease prediction discoveries should shift toward a more comprehensive risk prediction strategy that takes into account both common and rare variants. We propose a collapsing receiver operating characteristic (CROC) approach for risk prediction research on both common and rare variants. The new approach is an extension of a previously developed forward ROC (FROC) approach, with additional procedures for handling rare variants. The approach was evaluated using 533 single-nucleotide polymorphisms (SNPs) in 37 candidate genes from the Genetic Analysis Workshop 17 mini-exome data set. We found that a prediction model built on all SNPs gained more accuracy (AUC = 0.605) than one built on common variants alone (AUC = 0.585). We further evaluated the performance of the two approaches by gradually reducing the number of common variants in the analysis. We found that the CROC method attained more accuracy than the FROC method when the number of common variants in the data decreased. In an extreme scenario, when there are only rare variants in the data, the CROC reached an AUC value of 0.603, whereas the FROC had an AUC value of 0.524.
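The idea of collapsing rare variants before scoring can be illustrated with a minimal sketch. It is not the CROC procedure itself, whose details the abstract does not give. The 1% minor-allele-frequency cutoff, the single carrier indicator, and the function names `collapse_rare` and `auc` are all assumptions made for illustration.

```python
from typing import List, Sequence


def collapse_rare(genotypes: Sequence[Sequence[int]],
                  maf: Sequence[float],
                  threshold: float = 0.01) -> List[List[int]]:
    """Keep common SNPs as they are and replace all rare SNPs by one indicator.

    genotypes: one row per individual, minor-allele counts (0, 1 or 2) per SNP.
    maf: minor allele frequency of each SNP.
    threshold: assumed cutoff below which a SNP is treated as rare.
    """
    common = [j for j, f in enumerate(maf) if f >= threshold]
    rare = [j for j, f in enumerate(maf) if f < threshold]
    collapsed = []
    for row in genotypes:
        # 1 if the individual carries a minor allele at any rare SNP.
        carrier = 1 if any(row[j] > 0 for j in rare) else 0
        collapsed.append([row[j] for j in common] + [carrier])
    return collapsed


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical AUC: probability that a case outscores a control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > d else 0.5 if c == d else 0.0
               for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


# Toy data: one common SNP and two rare SNPs for four individuals.
toy_genotypes = [[1, 0, 0], [0, 1, 0], [2, 0, 1], [0, 0, 0]]
toy_maf = [0.30, 0.005, 0.002]
toy_labels = [0, 1, 1, 0]

features = collapse_rare(toy_genotypes, toy_maf)
carrier_scores = [row[-1] for row in features]
print(features, auc(carrier_scores, toy_labels))
```

In the toy data no single rare SNP separates cases from controls, but the collapsed carrier indicator does. This is the intuition behind pooling rare variants.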
Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models
Lübeck, Frederike, Wildberger, Jonas, Träuble, Frederik, Mordig, Maximilian, Gatidis, Sergios, Krause, Andreas, Schölkopf, Bernhard
Cardiovascular disease (CVD) risk prediction models are essential for identifying high-risk individuals and guiding preventive actions. However, existing models struggle with the challenges of real-world clinical practice as they oversimplify patient profiles, rely on rigid input schemas, and are sensitive to distribution shifts. We developed AdaCVD, an adaptable CVD risk prediction framework built on large language models extensively fine-tuned on over half a million participants from the UK Biobank. In benchmark comparisons, AdaCVD surpasses established risk scores and standard machine learning approaches, achieving state-of-the-art performance. Crucially, for the first time, it addresses key clinical challenges across three dimensions: it flexibly incorporates comprehensive yet variable patient information; it seamlessly integrates both structured data and unstructured text; and it rapidly adapts to new patient populations using minimal additional data. In stratified analyses, it demonstrates robust performance across demographic, socioeconomic, and clinical subgroups, including underrepresented cohorts. AdaCVD offers a promising path toward more flexible, AI-driven clinical decision support tools suited to the realities of heterogeneous and dynamic healthcare environments.
A multi-locus predictiveness curve and its summary assessment for genetic risk prediction
Wei, Changshuai, Li, Ming, Wen, Yalu, Ye, Chengyin, Lu, Qing
With the advance of high-throughput genotyping and sequencing technologies, it has become feasible to comprehensively evaluate the role of massive numbers of genetic predictors in disease prediction. There is, therefore, a critical need to develop appropriate statistical measures to assess the combined effects of these genetic variants in disease prediction. The predictiveness curve is commonly used as a graphical tool to measure the predictive ability of a risk prediction model based on a single continuous biomarker. Yet, for most complex diseases, risk prediction models are formed on multiple genetic variants. We therefore propose a multi-marker predictiveness curve and provide a non-parametric method to construct the curve for case-control studies. We further introduce a global predictiveness U and a partial predictiveness U to summarize the predictiveness curve across the whole population and a sub-population of clinical interest, respectively. We also demonstrate the connections of the predictiveness curve with the ROC curve and the Lorenz curve. Through simulation, we compared the performance of the predictiveness U to three other summary indices (R square, Total Gain, and Average Entropy) and showed that the predictiveness U outperformed the other three indices in terms of unbiasedness and robustness. Moreover, we simulated a series of rare-variant disease models and found that the partial predictiveness U performed better than the global predictiveness U. Finally, we conducted a real data analysis, using the predictiveness curve and predictiveness U to evaluate a risk prediction model for nicotine dependence.
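For orientation, a predictiveness curve plots predicted risk against its percentile in the population. The sketch below builds the empirical curve from a list of predicted risks, assuming a simple random (cohort) sample. It does not implement the case-control construction or the predictiveness U proposed in the abstract, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def predictiveness_curve(risks: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (risk percentile, risk) points of the empirical predictiveness curve.

    Assumes the risks come from a random sample of the target population.
    """
    ordered = sorted(risks)
    n = len(ordered)
    return [((i + 1) / n, r) for i, r in enumerate(ordered)]


def fraction_above(risks: Sequence[float], cutoff: float) -> float:
    """Share of the population whose predicted risk exceeds a clinical cutoff."""
    return sum(r > cutoff for r in risks) / len(risks)


example_risks = [0.3, 0.1, 0.2, 0.4]
print(predictiveness_curve(example_risks))
print(fraction_above(example_risks, 0.25))
```

A steep curve indicates that the model spreads individuals across low and high risks. A flat curve near the disease prevalence indicates little predictive information.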
Absolute Risk Prediction for Cannabis Use Disorder Using Bayesian Machine Learning
Wang, Tingfang, Boden, Joseph M., Biswas, Swati, Choudhary, Pankaj K.
Introduction: Substance use disorders (SUDs) have emerged as a pressing public health crisis in the United States, with adolescent substance use often leading to SUDs in adulthood. Effective strategies are needed to prevent this progression. To help fill this need, we develop the first absolute risk prediction model for cannabis use disorder (CUD) for adolescent or young adult cannabis users. Methods: We train a Bayesian machine learning model that provides a personalized CUD absolute risk for adolescent or young adult cannabis users using data from the National Longitudinal Study of Adolescent to Adult Health. Model performance is assessed using 5-fold cross-validation (CV) with area under the curve (AUC) and ratio of the expected to observed number of cases (E/O). External validation of the final model is conducted using two independent datasets. Results: The proposed model has five risk factors: biological sex, delinquency, and scores on personality traits of conscientiousness, neuroticism, and openness. For predicting CUD risk within five years of first cannabis use, AUC and E/O, computed via 5-fold CV, were 0.68 and 0.95, respectively. For the same type of prediction in external validation, AUC values were 0.64 and 0.75, with E/O values of 0.98 and 1, indicating good discrimination and calibration performances of the model. Discussion and Conclusion: The proposed model is the first absolute risk prediction model for an SUD. It can aid clinicians in identifying adolescent/youth substance users at a high risk of developing CUD in the future for clinically appropriate interventions.
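The E/O ratio reported above can be computed as the sum of predicted absolute risks divided by the number of observed cases. The sketch below assumes that definition. The function name and the toy numbers are illustrative and are not taken from the study.

```python
import math
from typing import Sequence


def expected_observed_ratio(predicted_risks: Sequence[float],
                            outcomes: Sequence[int]) -> float:
    """E/O: expected number of cases (sum of risks) over observed cases.

    Values near 1 suggest good calibration on average; values above 1
    suggest the model overpredicts, values below 1 that it underpredicts.
    """
    expected = math.fsum(predicted_risks)
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("E/O is undefined when no cases are observed")
    return expected / observed


toy_risks = [0.1, 0.2, 0.3, 0.4]
toy_outcomes = [0, 0, 1, 0]
print(expected_observed_ratio(toy_risks, toy_outcomes))
```

E/O only checks calibration on average. A model can have E/O close to 1 while still being miscalibrated within subgroups.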
A comparative study on feature selection for a risk prediction model for colorectal cancer
Cueto-López, N., García-Ordás, M. T., Dávila-Batista, V., Moreno, V., Aragonés, N., Alaiz-Rodríguez, R.
Background and objective: Risk prediction models aim at identifying people at higher risk of developing a target disease. Feature selection is particularly important to improve prediction model performance by avoiding overfitting and to identify the leading cancer risk (and protective) factors. Assessing the stability of feature selection/ranking algorithms becomes an important issue when the aim is to analyze the features with the most predictive power. Methods: This work focuses on colorectal cancer, assessing several feature ranking algorithms in terms of performance for a set of risk prediction models (Neural Networks, Support Vector Machines (SVM), Logistic Regression, k-Nearest Neighbors and Boosted Trees). Additionally, their robustness is evaluated following a conventional approach with scalar stability metrics and a visual approach proposed in this work to study both the similarity among feature ranking techniques and their individual stability. A comparative analysis is carried out between the most relevant features found in this study and features provided by experts according to state-of-the-art knowledge. Results: The two best results in terms of Area Under the ROC Curve (AUC) are achieved with an SVM classifier using the top-41 features selected by the SVM wrapper approach (AUC=0.693) and Logistic Regression with the top-40 features selected by the Pearson ranking (AUC=0.689). Experiments showed that performing feature selection contributes to classification performance, with a 3.9% and 1.9% improvement in AUC for the SVM and Logistic Regression classifiers, respectively, with respect to the results using the full feature set. The visual approach proposed in this work shows that the Neural Network-based wrapper ranking is the most unstable while the Random Forest ranking is the most stable.
Is this model reliable for everyone? Testing for strong calibration
Feng, Jean, Gossmann, Alexej, Pirracchio, Romain, Petrick, Nicholas, Pennello, Gene, Sahiner, Berkman
In a well-calibrated risk prediction model, the average predicted probability is close to the true event rate for any given subgroup. Such models are reliable across heterogeneous populations and satisfy strong notions of algorithmic fairness. However, the task of auditing a model for strong calibration is well-known to be difficult -- particularly for machine learning (ML) algorithms -- due to the sheer number of potential subgroups. As such, common practice is to only assess calibration with respect to a few predefined subgroups. Recent developments in goodness-of-fit testing offer potential solutions but are not designed for settings with weak signal or where the poorly calibrated subgroup is small, as they either overly subdivide the data or fail to divide the data at all. We introduce a new testing procedure based on the following insight: if we can reorder observations by their expected residuals, there should be a change in the association between the predicted and observed residuals along this sequence if a poorly calibrated subgroup exists. This lets us reframe the problem of calibration testing as one of changepoint detection, for which powerful methods already exist. We begin by introducing a sample-splitting procedure where a portion of the data is used to train a suite of candidate models for predicting the residual, and the remaining data are used to perform a score-based cumulative sum (CUSUM) test. To further improve power, we then extend this adaptive CUSUM test to incorporate cross-validation, while maintaining Type I error control under minimal assumptions. Compared to existing methods, the proposed procedure consistently achieved higher power in simulation studies and more than doubled the power when auditing a mortality risk prediction model.
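The reordering insight can be illustrated with a heavily simplified sketch. It is not the authors' score-based test: the ordering scores are taken as given (in practice they would come from a residual model fitted on a separate data split), and the null distribution is simulated by drawing outcomes from the audited model's own probabilities. The names `cusum_statistic` and `monte_carlo_p_value` are hypothetical.

```python
import random
from typing import Sequence


def cusum_statistic(y: Sequence[int], p: Sequence[float],
                    order_score: Sequence[float]) -> float:
    """Largest absolute cumulative sum of residuals y - p.

    Observations are visited from the highest to the lowest ordering score,
    so a miscalibrated subgroup ranked first produces a large early drift.
    """
    order = sorted(range(len(y)), key=lambda i: order_score[i], reverse=True)
    running, peak = 0.0, 0.0
    for i in order:
        running += y[i] - p[i]
        peak = max(peak, abs(running))
    return peak


def monte_carlo_p_value(y: Sequence[int], p: Sequence[float],
                        order_score: Sequence[float],
                        n_sim: int = 200, seed: int = 0) -> float:
    """Approximate p-value under the assumed null that y_i ~ Bernoulli(p_i)."""
    rng = random.Random(seed)
    observed = cusum_statistic(y, p, order_score)
    exceed = 0
    for _ in range(n_sim):
        y_sim = [1 if rng.random() < p_i else 0 for p_i in p]
        if cusum_statistic(y_sim, p, order_score) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_sim)


# Toy audit: the model predicts 0.2 for everyone, but the first 50 people
# (flagged by the ordering score) all experience the event.
toy_p = [0.2] * 200
toy_y = [1] * 50 + [1 if i % 5 == 0 else 0 for i in range(150)]
toy_score = [1.0] * 50 + [0.0] * 150
print(cusum_statistic(toy_y, toy_p, toy_score),
      monte_carlo_p_value(toy_y, toy_p, toy_score))
```

Using ordering scores learned on the same observations that are being tested would invalidate the p-value. This is why the abstract describes sample splitting and cross-validation.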
Using Artificial Intelligence To Help Prevent Suicide
It is estimated that over 40,000 Americans committed suicide in 2020. The loss of any life is devastating, but the loss of life due to suicide is exceptionally saddening. Suicide is the primary cause of mortality for Australians aged 15 to 44, taking the lives of almost nine people daily. According to some estimates, suicide attempts happen up to 30 times more often than fatalities. "Suicide has large effects when it happens. It impacts many people and has far-reaching consequences for family, friends, and communities," says Karen Kusuma, a University of New South Wales Ph.D. candidate in psychiatry at the Black Dog Institute, who investigates suicide prevention in adolescents.
Artificial intelligence may improve suicide prevention in the future
The loss of any life can be devastating, but the loss of a life from suicide is especially tragic. Around nine Australians take their own life each day, and it is the leading cause of death for Australians aged 15–44. Suicide attempts are more common, with some estimates stating that they occur up to 30 times as often as deaths. "Suicide has large effects when it happens. It impacts many people and has far-reaching consequences for family, friends and communities," says Karen Kusuma, a UNSW Sydney PhD candidate in psychiatry at the Black Dog Institute, who investigates suicide prevention in adolescents.
The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression
Methods to correct class imbalance, i.e. imbalance between the frequency of outcome events and non-events, are receiving increasing interest for developing prediction models. We examined the effect of imbalance correction on the performance of standard and penalized (ridge) logistic regression models in terms of discrimination, calibration, and classification. We examined random undersampling, random oversampling and SMOTE using Monte Carlo simulations and a case study on ovarian cancer diagnosis. The results indicated that all imbalance correction methods led to poor calibration (strong overestimation of the probability to belong to the minority class), but not to better discrimination in terms of the area under the receiver operating characteristic curve. Imbalance correction improved classification in terms of sensitivity and specificity, but similar results were obtained by shifting the probability threshold instead. Our study shows that outcome imbalance is not a problem in itself, and that imbalance correction may even worsen model performance.
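A small arithmetic sketch shows why random undersampling inflates predicted probabilities and why a shifted threshold gives the same classifications. It assumes an idealised setting in which non-events are kept at random with probability `keep_fraction` and the model recovers the true risk in the resampled data. The function names are illustrative and the sketch is not the simulation design of the study.

```python
import math


def risk_after_undersampling(p: float, keep_fraction: float) -> float:
    """Risk in the resampled data when non-events are kept with keep_fraction.

    Dropping non-events multiplies the odds of the event by 1 / keep_fraction.
    """
    return p / (p + keep_fraction * (1.0 - p))


def equivalent_threshold(threshold: float, keep_fraction: float) -> float:
    """Original-scale threshold matching `threshold` on the resampled scale."""
    return (threshold * keep_fraction) / (1.0 - threshold + threshold * keep_fraction)


# A true risk of 10% looks like about 53% after keeping 1 in 10 non-events.
inflated = risk_after_undersampling(0.10, 0.10)
# Classifying at 0.5 after undersampling matches classifying at about 0.09 before.
shifted = equivalent_threshold(0.5, 0.10)
print(round(inflated, 3), round(shifted, 3))
```

Under these assumptions the resampled model and the original model rank individuals identically, so discrimination does not improve. Only the probability scale changes, and it no longer reflects the true event rate.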
New risk calculator could lead to more successful heart operations
Patients could receive a greater benefit from open heart surgery thanks to a new computer model aimed at helping surgeons to better calculate risk and decide whether it's safe to operate. This project, jointly funded by the British Heart Foundation (BHF) and The Alan Turing Institute, will develop a new platform using machine learning to identify patients who are most likely to have a successful operation. Over 30,000 adult patients are considered for heart surgery every year in the UK, and risk prediction plays a major role in the decision-making process made by doctors and patients. Assessing risk before open heart surgery is crucial due to the potential complications that can arise during and after the operation. To calculate a patient's risk before surgery, heart surgeons currently use models such as the EuroSCORE – but this may overestimate the actual risk, partly due to improvements in patient management since the model was developed.